Data science is an interdisciplinary field that uses statistical and computational methods to extract insights and knowledge from data. It combines techniques and tools from statistics, mathematics, computer science, and domain-specific fields to analyze and interpret complex data sets.
Data science has become a popular career choice for several reasons:
Rated 4.8 stars from 612 ratings; 700+ students enrolled
4 hours per session; weekend and weekday batches available
CURRICULUM
Data Science in Real Life
- Defining Data Science
- What Does a Data Science Professional Do?
- Data Science in Business
- Use Cases for Data Science
- People in Data Science
Basic Analytics
- Data Types
- Basic statistics using data examples
- Measures of central tendency
- Correlation analysis
- Data Summarization
- Data Dictionary
- Outliers / Missing Values
- Basic Linear Algebra – dot product, matrix multiplication and transformations
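As a rough illustration of the correlation and linear-algebra items in this module, here is a minimal Python sketch using only the standard library. The function names (`dot`, `matmul`, `pearson`) are hypothetical, chosen for this example rather than taken from the course material; matrices are assumed to be lists of rows.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))

def matmul(A, B):
    """Multiply matrix A (m x n) by matrix B (n x p), both given as lists of rows."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions must match")
    cols = list(zip(*B))  # columns of B
    # Entry (i, j) is the dot product of row i of A with column j of B.
    return [[dot(row, col) for col in cols] for row in A]

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]  # deviations from the mean
    dy = [b - my for b in y]
    return dot(dx, dy) / math.sqrt(dot(dx, dx) * dot(dy, dy))
```

For example, `matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`, and `pearson([1, 2, 3], [2, 4, 6])` returns `1.0` because the two series are perfectly linearly related.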
Data Wrangling
- Data cleaning
- Data transformation
- Data merging
- Data reshaping
- Data aggregation
- Data validation
- Data imputation
- Data standardization
- Data filtering
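To make three of the wrangling steps above concrete (imputation, standardization, and filtering), the following is a small standard-library sketch. It assumes mean imputation and z-score standardization as the techniques; these are common choices, not necessarily the ones taught in the course, and the helper names are hypothetical.

```python
import statistics

def impute_mean(values):
    """Replace None entries with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]

def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation: z = (x - mean) / sd."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def filter_rows(rows, predicate):
    """Keep only the rows (dicts) for which predicate(row) is True."""
    return [r for r in rows if predicate(r)]
```

For instance, `impute_mean([1.0, None, 3.0])` gives `[1.0, 2.0, 3.0]`. In practice these steps are usually chained: impute first, then standardize, since the mean and standard deviation cannot be computed over missing entries.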
Introduction to R
- History and evolution of R
- Principles and software paradigm
- Description of R interface
- Advantages of R
- Drawbacks of R
- So why use R?
- References for learning
Advanced Data Manipulation in R (packages such as dplyr, plyr, sqldf, MASS)
- Importing and exporting data to and from .txt and .xls-like files
- Advanced data manipulation
- Accessing variables and management of subsets in data
- Working with characters, text and dates
Exploratory Analysis & Data Visualization
- Graphics for basic descriptive statistics
- Graphics for time-related data
- Introduction to more advanced graphics
- Adding relevant information and customizing your graphics
Fundamentals of Statistics
- Types of Variables, measures of central tendency and dispersion
- Variable Distributions and Probability Distributions
- Normal Distribution and Its Properties
- Central Limit Theorem and Its Applications
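A quick way to see the Central Limit Theorem at work is to simulate it. This sketch (standard library only; `sample_means` is a hypothetical helper) draws many samples from a Uniform(0, 1) distribution, which is not normal, and shows that the sample means cluster around 0.5 with the spread the theorem predicts.

```python
import random
import statistics

def sample_means(n, trials, seed=0):
    """Draw `trials` samples of size n from Uniform(0, 1) and return each sample's mean."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(trials)]

# Uniform(0, 1) has mean 0.5 and variance 1/12, so the CLT predicts that the
# sample means are approximately Normal(0.5, sqrt(1/12 / n)) for large n.
n = 30
means = sample_means(n, trials=5000)
print(round(statistics.fmean(means), 3))  # close to 0.5
print(round(statistics.stdev(means), 4))  # close to (1/12/30) ** 0.5, about 0.0527
```

Plotting a histogram of `means` would show the familiar bell shape even though each individual draw is flat.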
Basic Statistical Analysis
- Statistics basics: introduction to data analytics, descriptive and summary statistics
- Inferential statistics
Statistical Significance Tests
- Hypothesis testing: null/alternative hypothesis formulation
- Z-test, t-test
- Analysis of Variance (ANOVA)
- Chi-square test
- Correlation
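As a sketch of how two of the test statistics in this module are computed, the code below implements Welch's two-sample t statistic (a t-test variant that does not assume equal variances) and the Pearson chi-square statistic, using only the standard library. Turning these statistics into p-values needs the t and chi-square distribution functions, which the standard library lacks; SciPy provides them, for example through `scipy.stats.ttest_ind(..., equal_var=False)` and `scipy.stats.chisquare`.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1 denominator)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va / len(a) + vb / len(b))

def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

For example, `chi_square([10, 20, 30], [20, 20, 20])` is `10.0`: the first and last categories each contribute 100 / 20 = 5.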
Data Preparation
- Need for data preparation
- Outlier treatment
- Missing values treatment
- Multicollinearity
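One common outlier treatment is to cap values that fall outside the interquartile-range fences. The sketch below assumes that approach (often called winsorizing at Q1 - 1.5 IQR and Q3 + 1.5 IQR); it is one option among several, and `iqr_cap` is a hypothetical name. It relies on `statistics.quantiles`, whose default "exclusive" method determines the exact quartile values.

```python
import statistics

def iqr_cap(values, k=1.5):
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR]; order of values is preserved."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (sorts internally)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lo), hi) for v in values]
```

With `[1, 2, 3, 4, 5, 6, 7, 8, 9, 100]`, the quartiles are 2.75 and 8.25, so the upper fence is 16.5 and the value 100 is capped to 16.5 while the rest are unchanged. Whether to cap, drop, or keep outliers depends on whether they are errors or genuine extreme observations.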
Predictive Modeling & Time Series Analysis
- Basics of regression analysis
- Linear regression
- Logistic regression
- Interpretation of results
- Multivariate Regression modeling
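The core of linear regression with a single predictor is the ordinary least squares formula: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). A minimal standard-library sketch follows; `fit_line` is a hypothetical name, and multivariate and logistic regression need matrix or iterative methods not shown here.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation of x and y
    sxx = sum((a - mx) ** 2 for a in x)                   # variation of x
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope
```

On noise-free data such as `fit_line([0, 1, 2, 3], [1, 3, 5, 7])`, the fit recovers the generating line exactly: intercept `1.0` and slope `2.0`.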
Machine Learning Algorithms
- Text Analytics
- Random Forest
- Support Vector Machine (SVM)
- Naïve Bayes Algorithm
- K-NN Classification & Regression
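Of the algorithms listed, k-NN is the simplest to write from scratch, which makes it a useful reference point. The sketch below assumes Euclidean distance and a plain majority vote; `knn_predict` is a hypothetical helper, not a library function.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points (Euclidean distance)."""
    # Indices of the k training points closest to the query.
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # Ties are not handled specially; an odd k avoids them for two classes.
    return votes.most_common(1)[0][0]
```

For k-NN regression, the vote would be replaced by the mean of the neighbours' target values.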
Model Optimization
- Overfitting vs. Underfitting
- Bias-Variance Tradeoff
- Grid Search
- Random Search
- Feature Engineering Examples
- Ridge / Lasso Regression
- scikit-learn Pipelines
- scikit-learn Imputers
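Grid search and ridge regression fit together naturally: the ridge penalty is a hyperparameter, and grid search picks it by scoring each candidate on held-out data. This standard-library sketch assumes a single feature with no intercept, so the ridge estimate has the closed form w = sum(x*y) / (sum(x*x) + lambda); all names are hypothetical. Random search would differ only in sampling candidate values instead of trying a fixed grid.

```python
def ridge_1d(x, y, lam):
    """Ridge estimate for y ~ w * x (no intercept): w = sum(x*y) / (sum(x*x) + lam)."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)

def mse(x, y, w):
    """Mean squared error of the predictions w * x."""
    return sum((b - w * a) ** 2 for a, b in zip(x, y)) / len(x)

def grid_search(x_train, y_train, x_val, y_val, grid):
    """Return (best_lambda, best_val_mse) by trying every lambda in grid."""
    # Fit on the training split, score on the validation split.
    scores = {lam: mse(x_val, y_val, ridge_1d(x_train, y_train, lam)) for lam in grid}
    best = min(scores, key=scores.get)
    return best, scores[best]
```

On noise-free data (y = 2x), the unpenalized fit is already perfect, so `grid_search([1, 2, 3], [2, 4, 6], [4, 5], [8, 10], [0, 1, 10])` selects lambda `0`. With noisy data and many features, a positive penalty typically wins by trading a little bias for lower variance.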
Big Data Hadoop and Spark Developer
- Introduction to Big Data and Hadoop Ecosystem
- HDFS and Hadoop Architecture
- MapReduce and Sqoop
- Basics of Impala and Hive
- Working with Hive and Impala
- Types of Data Formats
- Advanced Hive Concepts and Data File Partitioning
- Apache Flume and HBase
- Apache Pig
- Basics of Apache Spark
- RDDs in Spark
- Implementation of Spark Applications
- Spark Parallel Processing
- Spark RDD Optimization Techniques
- Spark Algorithms
- Spark SQL
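The MapReduce model that underlies Hadoop (and that Spark generalizes) can be illustrated with the classic word count, simulated here in plain Python. This is a single-process sketch of the map, shuffle, and reduce phases, not Hadoop or Spark code; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    return reduce_phase(shuffle(chain.from_iterable(map_phase(line) for line in lines)))
```

In a real cluster, the mappers and reducers run in parallel on different nodes and the shuffle moves data over the network. In PySpark the same pipeline is commonly written with the RDD methods `flatMap`, `map`, and `reduceByKey`.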